ABSTRACT

Dirichlet mixed models find wide application. Estimation is usually achieved through the
method of moments. Here we present an iterative hybrid algorithm for obtaining the maximum
likelihood estimate employing both modified Newton-Raphson and E-M methods. This successful
MLE algorithm enables calculation of a jackknife MLE. Simulation comparison of the
three estimates is provided. The MLE substantially improves upon the moments estimator,
particularly with increasing dimension. The jackknife MLE in turn offers dramatic improvement
over the MLE.

1. INTRODUCTION


Mixture distributions afford a flexible, rich class of models. A general definition of a mixture
distribution which appears in Everitt and Hand (1981) goes as follows:

Let $g(x \mid \theta)$ be a d-dimensional probability density function with respect to some measure $\mu$,
depending on an m-dimensional parameter vector $\theta$, and let $H(\theta)$ be an m-dimensional cumulative
distribution function. Then

$$f(x) = \int_{\Theta} g(x \mid \theta)\, dH(\theta) \qquad (1.1)$$

is called a mixture density. $H$ is called the mixing distribution. If $H$ is discrete and assigns positive
probability to only a finite number of points $(\theta_i;\ i = 1, \ldots, c)$ then we have a finite mixture where

$$f(x) = \sum_{i=1}^{c} H(\theta_i) \cdot g(x \mid \theta_i) \qquad (1.2)$$

Throughout the literature on mixture distributions the goal has been to estimate $H$ assuming
a parametric form $g$. Finite mixtures date back to Pearson (1894) who attempted to estimate the
five parameters in a mixture of two normal distributions. Detailed discussions of mixtures can
be found in Titterington, et al. (1985) and Everitt and Hand (1981).

Identifiability of the mixture model is a crucial issue. Teicher (1961, 1963) was the first to
give a definitive answer to this problem. By definition, a class $D$ of mixtures is said to be
identifiable if and only if for all $f(x) \in D$ the equality a.e. $\mu$ of the two representations:

$$\int_{\Theta} g(x \mid \theta)\, dH^*(\theta) = \int_{\Theta} g(x \mid \theta)\, dH(\theta) \qquad (1.4)$$

implies that $H^*(\theta) = H(\theta)$.

We will assume that $H$ itself is from a parametric family indexed by $\alpha$ and that the goal is
to estimate $\alpha$ based on observations, $x$, from

$$f(x \mid \alpha) = \int_{\Theta} g(x \mid \theta)\, dH(\theta \mid \alpha) \qquad (1.5)$$

Discrete $f(x \mid \alpha)$ are more commonly referred to as compound distributions. Here we will
consider $X$ to be a vector of counts and $\theta$ (which will be a vector within the unit simplex) will
characterize the probabilities that a particular count within the $X$-vector will be incremented.
Models for $g$ include the multinomial distribution having density

$$g_\theta(x_1, x_2, \ldots, x_k) = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} \theta_i^{x_i}, \qquad \sum_{i=1}^{k} x_i = n, \quad \sum_{i=1}^{k} \theta_i = 1 \qquad (1.6)$$

and the negative-multinomial distribution, having density

$$g_\theta(x_1, x_2, \ldots, x_{k-1}) = \frac{\left(n + \sum_{j=1}^{k-1} x_j - 1\right)!}{(n-1)!\, \prod_{j=1}^{k-1} x_j!} \prod_{j=1}^{k-1} \theta_j^{x_j} \left(1 - \sum_{j=1}^{k-1} \theta_j\right)^{n}, \qquad \sum_{j=1}^{k-1} \theta_j < 1 \qquad (1.7)$$

In both the multinomial and negative multinomial cases the random variable is defined by
a particular stopping rule on the generalized Bernoulli trials. For the multinomial case the
random variable is observed when $n$ generalized Bernoulli trials are completed. For the negative
multinomial case the random variable is observed when a predesignated cell fills to size $n$.
More broadly, for a specified stopping rule on the generalized Bernoulli trials we say that the
resulting random vector $X$ of observed counts for the $k$ cells follows a general occupancy
distribution. Examples of other potentially interesting stopping rules are: (1) Sample until both
$X_1 \ge r_1$ and $X_2 \ge r_2$, (2) Sample until either $X_1 = r_1$ or $X_2 = r_2$. Generally, if $x$ is an outcome
in the sample space of a general occupancy model

$$g_\theta(x) = c(\theta)\, A(x) \prod_{i=1}^{k} \theta_i^{x_i} \qquad (1.8)$$

The natural conjugate choice for $H$ in this context would be the Dirichlet distribution,

$$h_\alpha(\theta) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{j=1}^{k-1} \theta_j^{\alpha_j - 1} \left(1 - \sum_{j=1}^{k-1} \theta_j\right)^{\alpha_k - 1}, \qquad \alpha_i > 0,\ i = 1, \ldots, k \qquad (1.9)$$

The assumption of the mixing distribution being conjugate buys simplicity of form. But in
addition Dalal and Hall (1983) point out that arbitrary mixture distributions can be satisfactorily
approximated by considering mixtures of natural conjugate distributions. In the present case we
obtain the Dirichlet (or compound) - multinomial distribution


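$$f_\alpha(x_1, \ldots, x_k) = \frac{n!}{\prod_{j=1}^{k} x_j!} \cdot \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \cdot \frac{\left(\prod_{j=1}^{k-1} \Gamma(x_j + \alpha_j)\right) \Gamma\left(n - \sum_{j=1}^{k-1} x_j + \alpha_k\right)}{\Gamma\left(n + \sum_{i=1}^{k} \alpha_i\right)}$$

$$= \frac{n!}{\prod_{j=1}^{k} x_j!} \cdot \frac{\prod_{i=1}^{k-1} \prod_{r=0}^{x_i - 1} (\alpha_i + r)\ \prod_{r=0}^{n - \sum_{j=1}^{k-1} x_j - 1} (\alpha_k + r)}{\prod_{r=0}^{n-1} (\alpha_1 + \alpha_2 + \cdots + \alpha_k + r)} \qquad (1.10)$$

and the Dirichlet (or compound) - negative multinomial distribution

$$f_\alpha(x_1, \ldots, x_{k-1}) = \frac{\left(n + \sum_{j=1}^{k-1} x_j - 1\right)!}{(n-1)!\, \prod_{j=1}^{k-1} x_j!} \cdot \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \cdot \frac{\left(\prod_{j=1}^{k-1} \Gamma(x_j + \alpha_j)\right) \Gamma(n + \alpha_k)}{\Gamma\left(n + \sum_{j=1}^{k-1} x_j + \sum_{i=1}^{k} \alpha_i\right)}$$

$$= \frac{\left(n + \sum_{j=1}^{k-1} x_j - 1\right)!}{(n-1)!\, \prod_{j=1}^{k-1} x_j!} \cdot \frac{\prod_{i=1}^{k-1} \prod_{r=0}^{x_i - 1} (\alpha_i + r)\ \prod_{r=0}^{n-1} (\alpha_k + r)}{\prod_{r=0}^{n + \sum_{j=1}^{k-1} x_j - 1} (\alpha_1 + \alpha_2 + \cdots + \alpha_k + r)} \qquad (1.11)$$

Extension to a Dirichlet general occupancy distribution is apparent.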
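
The two equivalent forms of the Dirichlet multinomial density in (1.10) can be compared numerically. The following is a minimal Python sketch, not taken from the paper; the function names `dm_logpmf_gamma` and `dm_logpmf_product` are hypothetical and chosen only for illustration.

```python
import math


def dm_logpmf_gamma(x, alpha):
    """Log of (1.10) using the gamma-function form."""
    n, a = sum(x), sum(alpha)
    out = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    out += math.lgamma(a) - math.lgamma(n + a)
    out += sum(math.lgamma(xi + ai) - math.lgamma(ai)
               for xi, ai in zip(x, alpha))
    return out


def dm_logpmf_product(x, alpha):
    """Log of (1.10) using the rising-product form (no gamma functions of alpha)."""
    n, a = sum(x), sum(alpha)
    out = math.lgamma(n + 1) - sum(math.lgamma(xi + 1) for xi in x)
    for xi, ai in zip(x, alpha):
        # prod_{r=0}^{x_i - 1} (alpha_i + r); an empty product when x_i = 0
        out += sum(math.log(ai + r) for r in range(xi))
    # prod_{r=0}^{n - 1} (alpha_1 + ... + alpha_k + r)
    out -= sum(math.log(a + r) for r in range(n))
    return out
```

The product form is the one that leads naturally to the likelihood equations used later, since its logarithm is a sum of simple log terms in the parameters.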

With regard to identifiability let $\Omega$ be the collection of all possible distinct events in the
sample space and let $N(\Omega)$ be the cardinality of $\Omega$. For any $k$ dimensional $\alpha$-vector we can
construct $N(\Omega)$ equations which describe the probabilities for all simple events in $\Omega$. Since all
of the probabilities must sum to 1 we will have $N(\Omega) - 1$ independent equations and $k$ unknowns.
If we have at least as many equations as we have unknowns, we cannot choose any other $\alpha$-vector
that will generate the same probabilities. Therefore, the condition for an identifiable Dirichlet
mixture will be

$$(N(\Omega) - 1) - k \ge 0 \qquad (1.12)$$

Hence, the Dirichlet negative multinomial is always identifiable, while the Dirichlet multinomial
is, provided $n > 1$.
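
As an illustration of condition (1.12) for the Dirichlet multinomial, the sample space consists of all count vectors of length $k$ summing to $n$, and the number of such vectors is the standard stars-and-bars count. The sketch below is an assumption-labelled illustration; the helper names are hypothetical.

```python
from math import comb


def dm_sample_space_size(n, k):
    """Number of count vectors (x_1, ..., x_k) with sum n (stars and bars)."""
    return comb(n + k - 1, k - 1)


def dm_satisfies_1_12(n, k):
    """Check the counting condition (N(Omega) - 1) - k >= 0 for the DM model."""
    return dm_sample_space_size(n, k) - 1 - k >= 0
```

With $n = 1$ there are exactly $k$ outcomes, so the condition fails for every $k$, which agrees with the requirement $n > 1$ stated above.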

Applications of these two mixture models are extensive in the literature. In particular see
Leckenby and Kishi (1984), Rust and Leone (1984), Kalwani (1980), and Mosimann (1962, 1963).


2. ESTIMATION APPROACHES


Method of moments estimation is most commonly used in the Dirichlet mixed models (see
e.g. Mosimann (1962, 1963), Johnson and Kotz (1969)). Maximum likelihood estimation
requires a difficult numerical maximization and has been studied primarily in the Beta mixed ($k = 2$)
case. (See e.g. Griffiths (1973), Smith (1983), Williams (1975)). Computation of MLE's is the
primary issue of section 3. Effective computation of the MLE enables us to propose a jackknifed
MLE as a third choice. Mean square error behavior of the jackknife estimator is extremely
promising.

We first review the method of moments estimator. First and second moments associated with
the Dirichlet multinomial in (1.10) are


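$$E X_i = \frac{n \alpha_i}{\sum \alpha} \qquad (2.1)$$

$$\mathrm{Var}\, X_i = \frac{n + \sum \alpha}{1 + \sum \alpha} \cdot n\, \frac{\alpha_i}{\sum \alpha} \left(1 - \frac{\alpha_i}{\sum \alpha}\right) \qquad (2.2)$$

$$\mathrm{Cov}(X_i, X_j) = -\frac{n + \sum \alpha}{1 + \sum \alpha} \cdot n\, \frac{\alpha_i}{\sum \alpha} \cdot \frac{\alpha_j}{\sum \alpha} \qquad (2.3)$$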


To obtain the moments estimates we would have the relationship $n \alpha_i / \sum \alpha = \bar{x}_i$. Since there is
no constraint on the $\alpha_i$'s, we need an extra equation. There is no unique criterion to determine
this extra equation, and it can therefore be considered an ad hoc choice.
Mosimann (1962) noticed that the covariance structure for the Dirichlet-multinomial is just
a constant times the covariance matrix for a multinomial with parameters
$(\alpha_1 / \sum \alpha, \ldots, \alpha_k / \sum \alpha)$:

$$\Sigma_{DM} = c \cdot \Sigma_{M}(p') \qquad (2.4)$$

where $p'_i = \alpha_i / \sum \alpha$ and $c = \dfrac{n + \sum \alpha}{1 + \sum \alpha}$.

Using this relationship the generalized variance would be $|\Sigma_{DM}| = c^{k-1} |\Sigma_{M}|$. In order
to avoid singular matrices, $(k-1)$ terms of the covariance matrix are used.

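For a sample $(X_{t1}, \ldots, X_{tk})$, $t = 1, \ldots, m$ (i.e. $m$ replications) from this distribution, Mosimann
showed that the MME's are of the form

$$\hat{\alpha}_j = \frac{\bar{x}_j (n - \hat{c})}{n(\hat{c} - 1)}, \qquad j = 1, \ldots, k \qquad (2.5)$$

$$\hat{c} = \left( \frac{|S|}{|\hat{\Sigma}_{M}|} \right)^{1/(k-1)}, \qquad |\hat{\Sigma}_{M}| = \frac{\left(\prod_{j=1}^{k-1} \bar{x}_j\right) \left(n - \sum_{j=1}^{k-1} \bar{x}_j\right)}{n} \qquad (2.6)$$

Here $|S|$ is the determinant of the sample covariance matrix, with

$$\bar{x}_i = \frac{1}{m} \sum_{t=1}^{m} x_{ti}, \qquad i = 1, \ldots, k-1 \qquad (2.7)$$

$$S_{ij} = \frac{1}{m} \sum_{t=1}^{m} (x_{ti} - \bar{x}_i)(x_{tj} - \bar{x}_j), \qquad i, j = 1, \ldots, k-1 \qquad (2.8)$$

In order for $\hat{\alpha}$ to be feasible, $1 < \hat{c} < n$. If this condition is not satisfied then we will say that
the MME does not exist for this mixture case.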
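
The moments recipe in (2.5) through (2.8) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `det` and `dm_mme` are hypothetical, and the divisor $m$ in the sample covariance is an assumption (a divisor of $m - 1$ is an equally common convention). The function returns `None` when the feasibility condition $1 < \hat{c} < n$ fails.

```python
def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    size, d = len(a), 1.0
    for c in range(size):
        p = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, size):
            f = a[r][c] / a[c][c]
            for j in range(c, size):
                a[r][j] -= f * a[c][j]
    return d


def dm_mme(X, n):
    """Moments estimate of alpha for the DM model; X holds m count vectors summing to n."""
    m, k = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / m for j in range(k)]
    # Sample covariance of the first k-1 coordinates (divisor m is an assumption).
    S = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in X) / m
          for j in range(k - 1)] for i in range(k - 1)]
    prod = 1.0
    for j in range(k - 1):
        prod *= xbar[j]
    det_multinomial = prod * (n - sum(xbar[:k - 1])) / n
    if det_multinomial <= 0.0:
        return None
    ratio = det(S) / det_multinomial
    if ratio <= 0.0:
        return None
    c_hat = ratio ** (1.0 / (k - 1))
    if not (1.0 < c_hat < n):
        return None  # the MME does not exist
    return [xbar[j] * (n - c_hat) / (n * (c_hat - 1.0)) for j in range(k)]
```

For example, `dm_mme([(x, 40 - x) for x in (23, 31, 1, 1, 3, 34, 17, 32, 31, 8)], 40)` yields a feasible estimate whose normalized components match the sample proportions.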

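First and second moments associated with the Dirichlet negative multinomial in (1.11) are

$$E X_j = \frac{n \alpha_j}{\alpha_k - 1}, \qquad \alpha_k > 1 \qquad (2.9)$$

$$\mathrm{Var}\, X_j = \frac{n + \alpha_k - 1}{\alpha_k - 2} \cdot n\, \frac{\alpha_j}{\alpha_k - 1} \cdot \frac{\alpha_j + \alpha_k - 1}{\alpha_k - 1}, \qquad \alpha_k > 2 \qquad (2.10)$$

$$\mathrm{Cov}(X_j, X_{j'}) = \frac{n + \alpha_k - 1}{\alpha_k - 2} \cdot n\, \frac{\alpha_j}{\alpha_k - 1} \cdot \frac{\alpha_{j'}}{\alpha_k - 1}, \qquad j \ne j' = 1, \ldots, k-1, \quad \alpha_k > 2 \qquad (2.11)$$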

Again Mosimann (1963) exploited the covariance structure of the negative multinomial and
Dirichlet-negative multinomial by observing that

$$\Sigma_{DNM} = c \cdot \Sigma_{NM}(p') \qquad (2.12)$$

where $p'_j = \alpha_j$, $j = 1, \ldots, k-1$, $p'_k = \alpha_k - 1$, and, from (2.10), $c = (n + \alpha_k - 1)/(\alpha_k - 2)$, to yield

$$\hat{\alpha}_j = \frac{\bar{x}_j (n + \hat{c})}{n(\hat{c} - 1)}, \quad j = 1, \ldots, k-1, \qquad \hat{\alpha}_k = \frac{2\hat{c} + n - 1}{\hat{c} - 1} \qquad (2.13)$$


For  some  reason  Mosimann  did  not  present  the  generalized  variance  ratio  estimate  of  c  for  the 
DNM  case.  Instead  he  chose  c  by  ignoring  the  overall  covariance  structure  and  equated 


(2.14) 


An alternative estimate consistent with the Dirichlet multinomial approach is one that accounts
for the full covariance structure. Therefore we use

$$\hat{c}^{\,k-1} = \frac{|S_{DNM}|}{|\hat{\Sigma}_{NM}|} = \frac{|S|}{\left(\prod_{j=1}^{k-1} \bar{x}_j\right) \left(n + \sum_{j=1}^{k-1} \bar{x}_j\right) / n} \qquad (2.15)$$

We should remember that this choice of estimate is ad hoc. However, we will call the resulting
estimate the MME, even though it is not uniquely determined. Again, if $\hat{c} \le 1$ we will say that
the MME does not exist.


There has been no study up to this time which evaluates the performance of these estimates
in both the DM and DNM cases. Since the estimates in both cases are rational functions of
consistent estimates (DNM case $\alpha_k > 2$), they too are consistent (Slutsky's theorem). Results
of simulation studies will be presented in section 4.

With regard to maximum likelihood estimation for multinomial and negative multinomial
distributions, MLE's and MME's are the same. In the mixture case, MLE's cannot be written
out in closed form. In order to proceed with maximum likelihood estimation it is sometimes
more convenient to write the compound distribution of interest under a particular
reparametrization. The reasoning behind this is that we will ultimately need an iterative procedure
to obtain the MLE. If we can choose a parametrization such that the parameter estimates do not
vary much in the region of 'best-fitting models', then we will have a more efficient iterative
procedure. These new parameters are called stable parameters. Ross (1970) discusses maximum
likelihood in this context.


In the Dirichlet multinomial model we reparametrize to

$$f_{\mu,\theta}(x) = \frac{n!}{\prod_{j=1}^{k} x_j!} \cdot \frac{\prod_{i=1}^{k-1} \prod_{r=0}^{x_i - 1} (\mu_i + r\theta)\ \prod_{r=0}^{n - \sum_{j=1}^{k-1} x_j - 1} \left(1 - \sum_{j=1}^{k-1} \mu_j + r\theta\right)}{\prod_{r=0}^{n-1} (1 + r\theta)},$$

$$0 < \mu_i < 1,\ i = 1, \ldots, k-1, \quad \theta > 0 \qquad (2.16)$$

where $\mu_i = \alpha_i / \sum \alpha$, $i = 1, \ldots, k-1$, and $\theta = 1 / \sum \alpha$. $\qquad (2.17)$

Under this parametrization $\mu_i$ can be thought of as the mean of the original $i$th cell probability, and $\theta$
can be thought of as a shape parameter. Griffiths (1973) seems to have been the first to use this
representation for the ($k = 2$) case. Under this parametrization, $f_{\mu,0}(x)$ is exactly a multinomial
with parameters $(\mu_1, \mu_2, \ldots, 1 - \sum_{j=1}^{k-1} \mu_j)$. Thus departures from $\theta = 0$ suggest departures from
pure multinomial variation. $\theta$ in this setting is sometimes called an overdispersion parameter.
From expression (2.4) we notice that

$$c = \frac{n\theta + 1}{\theta + 1} \qquad (2.18)$$

so when $\theta = 0$, $c = 1$.
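
The change of variables in (2.17) and the constant in (2.18) are simple enough to state as code. The following Python sketch uses hypothetical helper names and assumes $\alpha$ is given as a list of $k$ positive numbers.

```python
def alpha_to_mu_theta(alpha):
    """Map (alpha_1, ..., alpha_k) to (mu_1, ..., mu_{k-1}) and theta as in (2.17)."""
    a = sum(alpha)
    return [ai / a for ai in alpha[:-1]], 1.0 / a


def mu_theta_to_alpha(mu, theta):
    """Inverse map; requires theta > 0 (theta = 0 is the pure multinomial limit)."""
    return [m / theta for m in mu] + [(1.0 - sum(mu)) / theta]


def dm_variance_inflation(n, theta):
    """The constant c of (2.4) written in terms of theta, as in (2.18)."""
    return (n * theta + 1.0) / (theta + 1.0)
```

Note that the inverse map is undefined at $\theta = 0$, which corresponds to letting $\sum \alpha$ grow without bound.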


Skellam (1948) examined the log of expression (2.16) for the case of ($k = 2$) and took the
natural 'derivative log likelihood' approach. He proposed a recursive procedure through the
derivative log gamma or digamma function. Since the digamma function must be approximated, this
procedure for the general case is not appealing.

Using the reparametrization (2.17), the Dirichlet negative multinomial becomes

$$f_{\mu,\theta}(x) = \frac{\left(n + \sum_{j=1}^{k-1} x_j - 1\right)!}{(n-1)!\, \prod_{j=1}^{k-1} x_j!} \cdot \frac{\prod_{i=1}^{k-1} \prod_{r=0}^{x_i - 1} (\mu_i + r\theta)\ \prod_{r=0}^{n-1} \left(1 - \sum_{j=1}^{k-1} \mu_j + r\theta\right)}{\prod_{r=0}^{n + \sum_{j=1}^{k-1} x_j - 1} (1 + r\theta)},$$

$$0 < \mu_i < 1,\ i = 1, \ldots, k-1, \quad \theta > 0 \qquad (2.19)$$

When $\theta = 0$ we have a pure negative multinomial distribution, so $\theta$ again conveys departures
from negative multinomial variation. In this case, due to the constraint on the second moments
(i.e. $\alpha_k > 2$), $0 < \theta < 0.5$ and

$$c = \frac{1 - \sum_{j=1}^{k-1} \mu_j + (n-1)\theta}{1 - \sum_{j=1}^{k-1} \mu_j - 2\theta}$$


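The likelihood system of equations in the DM $(\mu_1, \ldots, \mu_{k-1}, \theta)$ case becomes

$$\frac{\partial \mathcal{L}}{\partial \mu_i} = \sum_{t=1}^{m} \sum_{r=0}^{x_{ti} - 1} \frac{1}{\mu_i + r\theta} - \sum_{t=1}^{m} \sum_{r=0}^{x_{tk} - 1} \frac{1}{1 - \sum_{j=1}^{k-1} \mu_j + r\theta} = 0, \qquad i = 1, \ldots, k-1$$

$$\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{t=1}^{m} \sum_{i=1}^{k-1} \sum_{r=0}^{x_{ti} - 1} \frac{r}{\mu_i + r\theta} + \sum_{t=1}^{m} \sum_{r=0}^{x_{tk} - 1} \frac{r}{1 - \sum_{j=1}^{k-1} \mu_j + r\theta} - \sum_{t=1}^{m} \sum_{r=0}^{n_t - 1} \frac{r}{1 + r\theta} = 0 \qquad (2.20)$$

where $x_{tk} = n_t - \sum_{j=1}^{k-1} x_{tj}$.

The likelihood system of equations in the DNM $(\mu_1, \ldots, \mu_{k-1}, \theta)$ case becomes

$$\frac{\partial \mathcal{L}}{\partial \mu_i} = \sum_{t=1}^{m} \sum_{r=0}^{x_{ti} - 1} \frac{1}{\mu_i + r\theta} - \sum_{t=1}^{m} \sum_{r=0}^{n_t - 1} \frac{1}{1 - \sum_{j=1}^{k-1} \mu_j + r\theta} = 0, \qquad i = 1, \ldots, k-1$$

$$\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{t=1}^{m} \sum_{i=1}^{k-1} \sum_{r=0}^{x_{ti} - 1} \frac{r}{\mu_i + r\theta} + \sum_{t=1}^{m} \sum_{r=0}^{n_t - 1} \frac{r}{1 - \sum_{j=1}^{k-1} \mu_j + r\theta} - \sum_{t=1}^{m} \sum_{r=0}^{n_t + \sum_{j=1}^{k-1} x_{tj} - 1} \frac{r}{1 + r\theta} = 0 \qquad (2.21)$$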


It is easily seen that for both (2.20) and (2.21) the likelihood equations do not yield a solution in
closed form; an iterative procedure is required. Once MLE's for $(\mu, \theta)$ are obtained, we can easily
convert them to MLE's for $\alpha$. For both distributional models we can obtain straightforward
expressions for Fisher's information matrix by taking a second derivative of the likelihood. See
Leeds (1987) for details. It is important to note that unequal $n_t$ can be used in maximum
likelihood estimation, but it is unclear how to proceed for moments estimation. Ad hoc weighting
procedures have been suggested for the ($k = 2$) case (see e.g. Kleinman (1970)). To allow
comparisons we will take $n_t$ equal in our simulation studies.

We also propose jackknifing of the MLE. The jackknife idea dates back to Quenouille
(1956) and is thoroughly discussed in Efron (1982). We consider the jackknife procedure for two
purposes. First, we hope to obtain a bias reduction, hence, a possibly better MSE performing
estimator (see Schucany, et al. 1971). Second, we wish to study the performance of the jackknife
estimate for possible confidence interval development.

We recall that given a sample of size $m$ and a point estimate $\hat{\Phi}$ of unknown parameter $\Phi$,
the jackknife constructs what are known as "pseudovalues" which are defined by the relationship

$$\tilde{\Phi}_i = m \hat{\Phi}_{ALL} - (m - 1) \hat{\Phi}_{(i)} \qquad (2.22)$$

where $\hat{\Phi}_{ALL}$ is the original estimate with all observations included and $\hat{\Phi}_{(i)}$ is the computed value
of the estimate with the $i$th observation removed. We would then compute the average and
standard deviation for the set of pseudovalues and call them $\tilde{\Phi}$ and $s$ respectively. Here

$$\tilde{\Phi} = \frac{1}{m} \sum_{i=1}^{m} \tilde{\Phi}_i \qquad (2.23)$$

$$s^2 = \frac{\sum_{i=1}^{m} (\tilde{\Phi}_i - \tilde{\Phi})^2}{m - 1} \qquad (2.24)$$

Our jackknife variance estimate of $\mathrm{Var}(\hat{\Phi}_{ALL})$ (possibly of $\mathrm{Var}(\tilde{\Phi})$) would be $m^{-1} s^2$ and
we would then construct the confidence set

$$\hat{\Phi}_{ALL} \pm t_{m-1} \cdot \frac{s}{\sqrt{m}} \qquad (2.25)$$

Successful jackknifing depends upon successful computation of pseudovalues, emphasizing
the need for an effective iterative MLE procedure. We develop such a procedure in the next
section.
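
The pseudovalue construction in (2.22) through (2.24) is generic in the estimator. A minimal Python sketch for a scalar estimate follows; the function name `jackknife` and its return convention are assumptions made for illustration, and for a vector parameter the same steps would be applied coordinate by coordinate.

```python
import math


def jackknife(data, estimator):
    """Return (mean of pseudovalues, jackknife standard error, pseudovalues).

    `estimator` maps a list of observations to a scalar estimate.
    """
    m = len(data)
    full = estimator(data)
    # Pseudovalue i: m * (estimate on all data) - (m - 1) * (estimate without observation i)
    pseudo = [m * full - (m - 1) * estimator(data[:i] + data[i + 1:])
              for i in range(m)]
    mean = sum(pseudo) / m
    s2 = sum((p - mean) ** 2 for p in pseudo) / (m - 1)
    return mean, math.sqrt(s2 / m), pseudo
```

A quick sanity check of the bias-reduction idea: applying this to the divisor-$m$ variance estimator returns the usual divisor-$(m-1)$ sample variance as the mean of the pseudovalues. In the setting of this paper each call to `estimator` would be a full MLE computation on $m - 1$ replications, which is why a reliable iterative procedure matters.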
3. ITERATIVE PROCEDURES FOR MLE'S


In this section we investigate in detail computational methods for obtaining the MLE. When
maximizing the log likelihood function it was proposed that we solve the likelihood system

$$\frac{\partial \mathcal{L}}{\partial \mu_i} = 0, \quad i = 1, \ldots, k-1, \qquad \frac{\partial \mathcal{L}}{\partial \theta} = 0$$

Recalling that the solution to these equations (if it exists) cannot be represented in closed form
for the mixture cases discussed, numerical methods sometimes referred to as "root finders" must
be considered. The most common methods for determining the roots for a system of nonlinear
equations are the Newton-Raphson Method (NR) and the Method of Scoring (MS).


If we let $\Phi = (\Phi_1, \ldots, \Phi_k)$ denote the vector parameter and $\mathcal{L}(\Phi)$ be the log likelihood
evaluated at $\Phi$, then the NR algorithm at iteration $(r+1)$ is defined by

$$\Phi^{(r+1)} = \Phi^{(r)} - \gamma_r \left[D^2 \mathcal{L}(\Phi^{(r)})\right]^{-1} D \mathcal{L}(\Phi^{(r)}), \qquad r = 0, 1, 2, \ldots \qquad (3.1)$$

and the MS algorithm is defined by

$$\Phi^{(r+1)} = \Phi^{(r)} + \gamma_r \left[I(\Phi^{(r)})\right]^{-1} D \mathcal{L}(\Phi^{(r)}), \qquad r = 0, 1, 2, \ldots \qquad (3.2)$$

In both cases the non-negative constant $\gamma_r$ can be thought of as a damping term ($\gamma_r = 1$ is the
usual version). $D$ and $D^2$ are differential operators representing generalized first and second
derivatives respectively, and $I(\Phi^{(r)})$ is Fisher's information matrix. We consider only the NR
algorithm since $D^2 \mathcal{L}(\Phi^{(r)})$ is available in closed form. The MS algorithm requires the numerical
computation of an expected matrix at each stage. In fact, we use a modified Newton-Raphson
approach to avoid the required matrix inversion in (3.1) at each iteration. There are many
different versions of the modified Newton-Raphson method with the least attention given to the
simplest version. This version can be constructed by computing only the second derivatives on
each coordinate separately and setting the mixed partials to zero.

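Let

$$H_{ii} = D_i^2 \mathcal{L}(\Phi), \qquad H_{ij} = 0, \quad i \ne j = 1, \ldots, k \qquad (3.3)$$

Now

$$H_{ii}^{-1} = \left(D_i^2 \mathcal{L}(\Phi)\right)^{-1} = \frac{1}{D_i^2 \mathcal{L}(\Phi)}, \quad i = 1, 2, \ldots, k, \qquad H_{ij}^{-1} = 0, \quad i \ne j = 1, \ldots, k \qquad (3.4)$$

yielding the system of equations

$$\Phi_i^{(r+1)} = \Phi_i^{(r)} - \left. \frac{\partial \mathcal{L} / \partial \Phi_i}{\partial^2 \mathcal{L} / \partial \Phi_i^2} \right|_{\Phi = \Phi^{(r)}}, \qquad i = 1, \ldots, k \qquad (3.5)$$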

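which can be thought of as $k$ versions of the univariate Newton-Raphson method. This diagonal
version, where equations would be updated as each new coordinate becomes available, will be
referred to as the Modified Newton-Raphson (MNR) algorithm. We remark that (1) Convergence
of the NR or MNR algorithm is typically dependent on the starting solution $\Phi^{(0)}$. (2)
When MNR converges, it converges quickly. (3) The MNR algorithm may not converge to the
root even when started 'close' to $\Phi^*$. (4) For what follows we take the parametrization
$\Phi = (\mu_1, \mu_2, \ldots, \mu_{k-1}, \theta)$ in which case the MNR algorithm becomes

$$\mu_i^{(r+1)} = \mu_i^{(r)} - \left. \frac{\partial \mathcal{L} / \partial \mu_i}{\partial^2 \mathcal{L} / \partial \mu_i^2} \right|_{\Phi = \Phi^{(r)}}, \qquad i = 1, 2, \ldots, k-1 \qquad (3.6)$$

$$\theta^{(r+1)} = \theta^{(r)} - \left. \frac{\partial \mathcal{L} / \partial \theta}{\partial^2 \mathcal{L} / \partial \theta^2} \right|_{\Phi = \Phi^{(r)}} \qquad (3.7)$$

For the Dirichlet multinomial model, first partials are given in (2.20) with

$$\frac{\partial^2 \mathcal{L}}{\partial \mu_i^2} = -\sum_{t=1}^{m} \sum_{r=0}^{x_{ti} - 1} \frac{1}{(\mu_i + r\theta)^2} - \sum_{t=1}^{m} \sum_{r=0}^{x_{tk} - 1} \frac{1}{\left(1 - \sum_{j=1}^{k-1} \mu_j + r\theta\right)^2}$$

$$\frac{\partial^2 \mathcal{L}}{\partial \theta^2} = -\sum_{t=1}^{m} \sum_{i=1}^{k-1} \sum_{r=0}^{x_{ti} - 1} \frac{r^2}{(\mu_i + r\theta)^2} - \sum_{t=1}^{m} \sum_{r=0}^{x_{tk} - 1} \frac{r^2}{\left(1 - \sum_{j=1}^{k-1} \mu_j + r\theta\right)^2} + \sum_{t=1}^{m} \sum_{r=0}^{n_t - 1} \frac{r^2}{(1 + r\theta)^2} \qquad (3.8)$$

For the Dirichlet negative multinomial model, first partials are given in (2.21) with

$$\frac{\partial^2 \mathcal{L}}{\partial \mu_i^2} = -\sum_{t=1}^{m} \sum_{r=0}^{x_{ti} - 1} \frac{1}{(\mu_i + r\theta)^2} - \sum_{t=1}^{m} \sum_{r=0}^{n_t - 1} \frac{1}{\left(1 - \sum_{j=1}^{k-1} \mu_j + r\theta\right)^2}$$

$$\frac{\partial^2 \mathcal{L}}{\partial \theta^2} = -\sum_{t=1}^{m} \sum_{i=1}^{k-1} \sum_{r=0}^{x_{ti} - 1} \frac{r^2}{(\mu_i + r\theta)^2} - \sum_{t=1}^{m} \sum_{r=0}^{n_t - 1} \frac{r^2}{\left(1 - \sum_{j=1}^{k-1} \mu_j + r\theta\right)^2} + \sum_{t=1}^{m} \sum_{r=0}^{n_t + \sum_{j=1}^{k-1} x_{tj} - 1} \frac{r^2}{(1 + r\theta)^2} \qquad (3.9)$$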
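
One sweep of the coordinate-wise MNR update for the Dirichlet multinomial model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `dm_partials` and `mnr_sweep` are hypothetical, each row of `X` is a full count vector of length $k$, and no safeguard is included for iterates that leave the region $0 < \mu_i < 1$, $\theta > 0$.

```python
def dm_partials(X, mu, theta):
    """First and second partials of the DM log likelihood in (mu_1..mu_{k-1}, theta)."""
    k = len(X[0])
    p = list(mu) + [1.0 - sum(mu)]          # last cell uses 1 - sum(mu)
    g_mu = [0.0] * (k - 1)
    h_mu = [0.0] * (k - 1)
    g_th = h_th = 0.0
    for row in X:
        for i, x in enumerate(row):
            for r in range(x):
                d = p[i] + r * theta
                if i < k - 1:
                    g_mu[i] += 1.0 / d
                    h_mu[i] -= 1.0 / d ** 2
                else:
                    # the k-th cell term depends on every mu_j through 1 - sum(mu)
                    for j in range(k - 1):
                        g_mu[j] -= 1.0 / d
                        h_mu[j] -= 1.0 / d ** 2
                g_th += r / d
                h_th -= (r / d) ** 2
        for r in range(sum(row)):           # n_t = sum of the row
            d = 1.0 + r * theta
            g_th -= r / d
            h_th += (r / d) ** 2
    return g_mu, h_mu, g_th, h_th


def mnr_sweep(X, mu, theta):
    """One MNR sweep: update each coordinate in turn, reusing earlier updates."""
    mu = list(mu)
    for i in range(len(mu)):
        g_mu, h_mu, _, _ = dm_partials(X, mu, theta)
        mu[i] -= g_mu[i] / h_mu[i]
    _, _, g_th, h_th = dm_partials(X, mu, theta)
    theta -= g_th / h_th
    return mu, theta
```

The second partial in $\theta$ mixes negative and positive terms, so its sign is not fixed. For the single observation `(2, 1)` at $\mu_1 = 0.5$, $\theta = 0.5$ it is positive, which illustrates why the $\theta$ step is not guaranteed to move uphill.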


A major problem with the NR or MNR algorithm is the need for a good starting solution.
The MME would seem to be a reasonable starting solution, but as mentioned before, there are
many cases where we cannot compute a moments estimate. Even if we can compute the MME,
what recourse do we have if this starting solution causes the algorithm to diverge?

The EM algorithm (Dempster, et al. (1977)) offers an alternative approach. The algorithm
was originally proposed for the treatment of incomplete data, but can accommodate many other
situations. The EM algorithm generates from some starting solution $\Phi^{(0)}$ a sequence $\{\Phi^{(r)}\}$ of
estimates in the following steps

E-STEP: Evaluate $Q(\Phi, \Phi^{(r)}) = E\left(\log f(y \mid \Phi) \mid x, \Phi^{(r)}\right)$ (3.10)

M-STEP: Find $\Phi = \Phi^{(r+1)}$ to maximize $Q(\Phi, \Phi^{(r)})$ (3.11)

One of the more appealing properties of the EM algorithm is that under mild conditions (see
e.g. Wu, 1983) each successive iterate increases the likelihood function.

When the complete data likelihood comes from a regular exponential family represented by

$$f(y \mid \Phi) = b(y) \exp\left(\Phi \cdot t(y)^T\right) / a(\Phi) \qquad (3.13)$$

where $t(y)^T$ denotes a $(k \times 1)$ 'complete data' sufficient statistic, the E and M steps take on a more
explicit form.


E-STEP: Estimate the complete data sufficient statistic by finding

$$t^{(r)} = E\left(t(y) \mid x, \Phi^{(r)}\right) \qquad (3.14)$$

M-STEP: Determine $\Phi^{(r+1)}$ as the solution of the equations

$$E\left(t(y) \mid \Phi\right) = t^{(r)} \qquad (3.15)$$

The M-STEP is a maximization step because this condition must hold when obtaining a
maximum under a regular exponential family model. Cycling back and forth between E and M steps
should yield the MLE, when and if the iterative sequence stabilizes. At the MLE the following
relationship holds:

$$E\left(t(y) \mid x, \Phi^*\right) = E\left(t(y) \mid \Phi^*\right) \qquad (3.16)$$

This equality of conditional to unconditional expectation at the MLE has also been noticed by
other authors (Baum, et al., 1970, Orchard and Woodbury, 1972, Sundberg, 1974).


Under Dirichlet mixture we have the 'incomplete data' $x = \{(x_{t1}, \ldots, x_{tk})\}$ or $\{(x_{t1}, \ldots, x_{t,k-1})\}$
as the case may be, and the 'complete data' $y = \{(x_{t1}, \ldots, x_{tk}, p_{t1}, \ldots, p_{tk})\}$. Thus

$$f(y \mid \Phi) = f(x, p \mid \alpha) = f(x \mid p) \cdot D(p \mid \alpha) \qquad (3.17)$$

where $f(x \mid p)$ represents the multinomial or negative multinomial and $D(p \mid \alpha)$ represents the
Dirichlet distribution. Hence the distribution $f(y \mid \Phi)$ is an exponential family distribution. If we
now sample $(x_{t1}, \ldots, x_{tk})$, $t = 1, 2, \ldots, m$ we have

$$\prod_{t=1}^{m} f(x_t, p_t \mid \alpha_1, \ldots, \alpha_k) = \left[\prod_{t=1}^{m} f(x_t \mid p_t)\right] \left[\prod_{t=1}^{m} \prod_{i=1}^{k} p_{ti}^{-1}\right] \exp\left\{\sum_{t=1}^{m} \sum_{i=1}^{k} \alpha_i \log p_{ti}\right\} \left[\frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}\right]^{m} \qquad (3.18)$$

where $\Phi = (\alpha_1, \ldots, \alpha_k)$ and $t_i(y) = t_i(p) = \sum_{t=1}^{m} \log p_{ti}$, $i = 1, \ldots, k$.


It is interesting to note that $t(p)$ is always the same under Dirichlet mixture and thus does
not depend on the original distribution being mixed.


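To put together the E and M steps we must compute

$$E\left(t_i(p) \mid \Phi\right) = E\left(\sum_{t=1}^{m} \log p_{ti} \,\Big|\, \alpha\right) \quad \text{and} \quad E\left(t_i(p) \mid x, \Phi\right) = E\left(\sum_{t=1}^{m} \log p_{ti} \,\Big|\, x_t, \alpha\right) \qquad (3.19)$$

Now

$$E\left(\sum_{t=1}^{m} \log p_{ti} \,\Big|\, \alpha\right) = \sum_{t=1}^{m} E(\log p_{ti} \mid \alpha) = m\, E(\log p_{1i} \mid \alpha)$$

Using the expansion

$$\log p = -\sum_{j=1}^{\infty} \frac{(1 - p)^j}{j}$$

after some manipulation yields

$$E\left(\sum_{t=1}^{m} \log p_{ti} \,\Big|\, \alpha\right) = -m \sum_{j=1}^{\infty} \frac{1}{j} \prod_{s=1}^{j} \frac{\sum \alpha - \alpha_i + s - 1}{\sum \alpha + s - 1} \qquad (3.20)$$

In order to evaluate $E\left(\sum_{t=1}^{m} \log p_{ti} \mid x_t, \alpha\right)$ we notice that the posterior distribution of $p$ is also
Dirichlet. Similar calculations to that yielding (3.20) produce

$$E\left(\sum_{t=1}^{m} \log p_{ti} \,\Big|\, x_t, \alpha\right) = -\sum_{t=1}^{m} \sum_{j=1}^{\infty} \frac{1}{j} \prod_{s=1}^{j} \frac{\sum \gamma(t) - \gamma_i(t) + s - 1}{\sum \gamma(t) + s - 1} \qquad (3.21)$$

where $\gamma_i(t) = x_{ti} + \alpha_i$, $i = 1, \ldots, k$.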


Observing (3.15) and (3.20) we see that in our case the M-step does not admit a closed form
solution. The M-step would have to be solved iteratively within each iteration. Instead we use the
necessary condition (3.16). Solving the stable point problem in (3.16) will be called the modified
EM algorithm (or MEM algorithm). Solving the EM algorithm will solve (3.16). However,
solving (3.16) does not necessarily provide the solution generated by the EM algorithm, unless the
solution to (3.16) is unique. In general under Dirichlet mixture the MEM equations (3.16) are

$$\sum_{t=1}^{m} \sum_{j=1}^{\infty} \frac{1}{j} \left[ E\left((1 - p_{ti})^j \mid x_t, \alpha\right) - E\left((1 - p_{ti})^j \mid \alpha\right) \right] = 0, \qquad i = 1, \ldots, k$$


When using the EM algorithm to obtain an exact solution for the MLE it is well known that
convergence to a solution is extremely slow. However, Redner and Walker (1984) point out that
a quick climbing of the likelihood surface usually occurs in only a few iterations. We hope to
retain this feature with our proposed MEM algorithm.

By a quick inspection we can see that $\theta = 0$ is a solution to the MEM equations in both
cases. However, this result is never achieved unless $\theta = 0$ is used as a starting solution. Explicit
solutions are not available and therefore a root finding method such as the MNR method can be
used here. The infinite summations can be truncated to obtain approximate solutions.
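
The truncated series in (3.20) and (3.21) can be evaluated with a running product, as in the following Python sketch. The function names and the default truncation point are assumptions made for illustration, and `mem_residual` covers only the Dirichlet multinomial case, where the posterior parameters are $\alpha_i + x_{ti}$.

```python
def expected_log_p(alpha_i, alpha_sum, terms=10000):
    """Truncated series for E(log p_i) under a Dirichlet with the given parameters."""
    total, prod = 0.0, 1.0
    for j in range(1, terms + 1):
        # running product over s = 1..j of (sum - alpha_i + s - 1) / (sum + s - 1)
        prod *= (alpha_sum - alpha_i + j - 1) / (alpha_sum + j - 1)
        total -= prod / j
    return total


def mem_residual(X, alpha, i, terms=10000):
    """Left side of the i-th MEM equation: posterior minus prior expectation of t_i."""
    a = sum(alpha)
    prior = expected_log_p(alpha[i], a, terms)
    return sum(expected_log_p(alpha[i] + row[i], a + sum(row), terms) - prior
               for row in X)
```

The terms of the series decay roughly like $j^{-1-\alpha_i}$, so small $\alpha_i$ values need many terms. For $\alpha_i = 1$ and $\sum \alpha = 2$ the partial sum after $J$ terms is $-(1 - 1/(J+1))$, which gives a direct view of the truncation error against the exact value $-1$.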


To solve the MEM equations we would use the $(\mu, \theta)$ parametrization and construct the
MNR system in the same way it was constructed for the original likelihood equations. In this case
$\partial \mathcal{L} / \partial \Phi_i$ is replaced by expression (3.16) yielding the system

$$\mu_i^{(r+1)} = \mu_i^{(r)} - \left. \frac{E(t_i(y) \mid x, \Phi) - E(t_i(y) \mid \Phi)}{\frac{\partial}{\partial \mu_i} \left[ E(t_i(y) \mid x, \Phi) - E(t_i(y) \mid \Phi) \right]} \right|_{\Phi = \Phi^{(r)}}$$

$$\theta^{(r+1)} = \theta^{(r)} - \left. \frac{E(t_k(y) \mid x, \Phi) - E(t_k(y) \mid \Phi)}{\frac{\partial}{\partial \theta} \left[ E(t_k(y) \mid x, \Phi) - E(t_k(y) \mid \Phi) \right]} \right|_{\Phi = \Phi^{(r)}} \qquad (3.22)$$

To compute derivatives we use the fact that

$$\frac{\partial}{\partial \Phi} \prod_{s=1}^{j} h_s(\Phi) = \frac{\partial}{\partial \Phi} \exp\left(\sum_{s=1}^{j} \log h_s(\Phi)\right) = \left[\prod_{s=1}^{j} h_s(\Phi)\right] \sum_{s=1}^{j} \frac{\partial h_s(\Phi) / \partial \Phi}{h_s(\Phi)}$$

We can substitute the appropriate expressions for the derivative in the MNR system because the
MEM equations are just functions of products of this type. For efficient computation of all
terms, recurrence relationships for the product functions and derivatives can be created for
increasing $j$.


Finally we state our MLE algorithm. It can be thought of as a hybrid algorithm given by the
following steps:


1) Choose a starting solution $\Phi^{(0)}$. If $\hat{\Phi}_{MME}$ exists then $\Phi^{(0)} = \hat{\Phi}_{MME}$. If $\hat{\Phi}_{MME}$ does not exist then
the starting solution $(\varepsilon, \varepsilon, \ldots, \varepsilon, [\text{illegible}])$ is used.

2) Iterate using the MNR equations in (3.6)-(3.7) with derivatives given by
expression (3.8) or (3.9).


3a) If step 2 yields a converging sequence $\{\Phi^{(r)}\} \to \Phi^*$ then $\hat{\Phi}_{MLE} = \Phi^*$.

3b) If step 2 diverges then we run about 20 MEM iterations starting at the
last iterate generated by the previous MEM run. If MEM is being run
for the first time then we can start at $\Phi^{(0)}$. MEM is intended to
point failed MNR starting solutions in the right direction. Return to step 2
after 20 MEM iterations are completed.
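
The control flow of steps 1 through 3b can be sketched independently of the model. In the following Python outline everything is an assumption made for illustration: `hybrid_mle`, the callables `mnr_run` (returns a candidate and a convergence flag) and `mem_step` (performs one MEM iteration), and the cap `max_rounds`, which the text does not specify.

```python
def hybrid_mle(start, mnr_run, mem_step, mem_iters=20, max_rounds=50):
    """Alternate MNR attempts with short MEM runs until MNR converges.

    start     : starting solution (the MME if it exists)
    mnr_run   : callable phi -> (candidate, converged)
    mem_step  : callable phi -> phi after one MEM iteration
    Returns the converged MNR solution, or None if max_rounds is exhausted.
    """
    phi = start                      # MEM state; carried across failed MNR attempts
    for _ in range(max_rounds):
        candidate, converged = mnr_run(phi)      # step 2
        if converged:
            return candidate                     # step 3a
        for _ in range(mem_iters):               # step 3b: continue the MEM run
            phi = mem_step(phi)
    return None
```

Keeping the MEM state in `phi` means a failed MNR attempt is discarded and the next MEM run resumes from the last MEM iterate, as step 3b describes.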


In concluding this section we address the question of whether the proposed MLE algorithm
obtains the MLE. To do so we investigate the likelihood surface and ask the following questions:

A) If $\partial \mathcal{L} / \partial \Phi = 0$, might we have obtained a minimum or saddle point?

B) If $\partial \mathcal{L} / \partial \Phi = 0$ yields a maximum, is it a global maximum?

C) Is $\mathcal{L}$ unimodal?

For the $\mathcal{L}$ functions being considered, (C) cannot be answered analytically in the general case.
Levin and Reeds (1977) have shown that if $\mu_1, \ldots, \mu_{k-1}$ are known and $\theta$ is unknown, then $\mathcal{L}$
has at most one mode. This result suggests that unimodality may be preserved even if the $\mu_i$'s
are unknown.

Assuming that unimodality cannot be verified, we must address case (B), the arrival at a local
maximum. The class of iterative algorithms

$$\Phi^{(r+1)} = \Phi^{(r)} - M_r^{-1} \cdot \left. \left(\frac{\partial \mathcal{L}}{\partial \Phi}\right) \right|_{\Phi = \Phi^{(r)}}, \qquad r = 0, 1, 2, \ldots \qquad (3.23)$$

includes both MNR and MEM. Using (3.23) along with a Taylor expansion for $\mathcal{L}(\Phi^{(r+1)})$ at
$\Phi^{(r)}$ we obtain

$$\mathcal{L}(\Phi^{(r+1)}) \approx \mathcal{L}(\Phi^{(r)}) - \left. \left(\frac{\partial \mathcal{L}}{\partial \Phi}\right)^{T} \cdot \left(M_r^{-1}\right) \cdot \left(\frac{\partial \mathcal{L}}{\partial \Phi}\right) \right|_{\Phi = \Phi^{(r)}} \qquad (3.24)$$

If $M_r$ is positive (or negative) definite, then the iterative algorithm (3.23) is a descent (or ascent)
algorithm. For simplicity we will call this type of algorithm a monotone algorithm. Under a
monotone algorithm, the answer to (B) is yes. For the MNR algorithm, this requires that all
diagonal elements of $-H$ are positive.

It is interesting to note that the choice of reparametrization from $\alpha$ to $(\mu, \theta)$ makes a difference in the shape of the likelihood and in the behavior of the iterative algorithm. This property reinforces the use of "stable parameters" as justified in a slightly different manner by Ross (1970). More precisely, we notice that the $\mathcal{L}$ function for both the Dirichlet multinomial and the Dirichlet negative multinomial has the following form under the $(\alpha_1, \ldots, \alpha_k)$ parametrization:

$$\mathcal{L} = c(\mathbf{x}) + \sum_j \sum_i \sum_r \log(\alpha_i + r) - \sum_j \sum_r \log\Bigl(\textstyle\sum \alpha + r\Bigr)$$


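If we now look at the second derivatives $\partial^2 \mathcal{L} / \partial \alpha_i^2$, $i = 1, 2, \ldots, k$, we have

$$-\frac{\partial^2 \mathcal{L}}{\partial \alpha_i^2} = \sum_j \sum_r \frac{1}{(\alpha_i + r)^2} - \sum_j \sum_r \frac{1}{(\sum \alpha + r)^2}$$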


From this relation (for all $i$) it is unclear what the likelihood surface might look like and whether we would obtain a monotone algorithm. Figure 1 illustrates the behavior of the function for $k = 2$, $m = 10$, $n = 40$ based upon the sample $x_1, \ldots, x_{10} = 23, 31, 1, 1, 3, 34, 17, 32, 31, 8$.


Figure  1.  Log  Likelihood  function  under  the  original  parametrization
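To make the original parametrization concrete, the following sketch evaluates the log likelihood kernel and the negative second derivative in $\alpha_1$ for the $k = 2$ sample quoted above. It is a minimal illustration under assumptions: the summation limits (`r` running from 0 to one less than the relevant count), the reading of the sample as first-category counts out of $n = 40$, and all function names are not taken from the text.

```python
import math

# Sample quoted in the text for k = 2, m = 10, n = 40,
# read here (an assumption) as counts in the first category.
X = [23, 31, 1, 1, 3, 34, 17, 32, 31, 8]
N = 40


def rising_log_sum(a, count):
    """sum_{r=0}^{count-1} log(a + r); the limits are an assumed convention."""
    return sum(math.log(a + r) for r in range(count))


def loglik_alpha(a1, a2, xs=X, n=N):
    """Log likelihood kernel (constant c(x) omitted) under (alpha_1, alpha_2)."""
    total = 0.0
    for x in xs:
        total += rising_log_sum(a1, x) + rising_log_sum(a2, n - x)
        total -= rising_log_sum(a1 + a2, n)
    return total


def neg_second_derivative_a1(a1, a2, xs=X, n=N):
    """-(d^2 L / d alpha_1^2): a difference of two positive sums."""
    out = 0.0
    for x in xs:
        out += sum(1.0 / (a1 + r) ** 2 for r in range(x))
        out -= sum(1.0 / (a1 + a2 + r) ** 2 for r in range(n))
    return out


print(loglik_alpha(1.0, 1.0), neg_second_derivative_a1(1.0, 1.0))
```

Because the second derivative is a difference of two positive sums, its sign is not fixed in advance, which is the difficulty noted in the text.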


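Under the $(\mu_1, \ldots, \mu_{k-1}, \theta)$ parametrization $\mathcal{L}$ has the form

$$\mathcal{L} = c(\mathbf{x}) + \sum_j \sum_{i=1}^{k-1} \sum_r \log(\mu_i + r\theta) + \sum_j \sum_r \log\Bigl(1 - \sum_{h=1}^{k-1} \mu_h + r\theta\Bigr) - \sum_j \sum_r \log(1 + r\theta)$$

Now if we examine the second order partial derivatives on the $\mu_i$ coordinates

$$-\frac{\partial^2 \mathcal{L}}{\partial \mu_i^2} = \sum_j \sum_r \frac{1}{(\mu_i + r\theta)^2} + \sum_j \sum_r \frac{1}{\bigl(1 - \sum_{h=1}^{k-1} \mu_h + r\theta\bigr)^2} > 0, \qquad i = 1, \ldots, k-1$$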


so the log likelihood is concave in all $\mu_i$ directions. Figure 2 illustrates this behavior and we can also see that $\mathcal{L}$ appears smoother under the $(\mu, \theta)$ parametrization. To assess whether the iterative algorithm is monotone we need only examine whether $-\partial^2 \mathcal{L} / \partial \theta^2 > 0$.


Figure  2.  Log  Likelihood  function  under  the  reparametrization
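The effect of the reparametrization can be checked numerically for the $k = 2$ sample. The sketch below assumes the mapping $\mu_i = \alpha_i / \sum \alpha$ and $\theta = 1 / \sum \alpha$, which is inferred from the form of the two log likelihoods rather than stated in this section; the summation limits and function names are likewise assumptions.

```python
import math

# Sample quoted in the text for k = 2, m = 10, n = 40 (assumed first-category counts).
X = [23, 31, 1, 1, 3, 34, 17, 32, 31, 8]
N = 40


def loglik_alpha(a1, a2, xs=X, n=N):
    """Kernel under the original (alpha_1, alpha_2) parametrization."""
    total = 0.0
    for x in xs:
        total += sum(math.log(a1 + r) for r in range(x))
        total += sum(math.log(a2 + r) for r in range(n - x))
        total -= sum(math.log(a1 + a2 + r) for r in range(n))
    return total


def loglik_mu_theta(mu, theta, xs=X, n=N):
    """Kernel under (mu, theta), assuming mu = a1/(a1+a2), theta = 1/(a1+a2)."""
    total = 0.0
    for x in xs:
        total += sum(math.log(mu + r * theta) for r in range(x))
        total += sum(math.log(1.0 - mu + r * theta) for r in range(n - x))
        total -= sum(math.log(1.0 + r * theta) for r in range(n))
    return total


def neg_second_derivative_mu(mu, theta, xs=X, n=N):
    """-(d^2 L / d mu^2): a sum of positive terms, so L is concave in mu."""
    out = 0.0
    for x in xs:
        out += sum(1.0 / (mu + r * theta) ** 2 for r in range(x))
        out += sum(1.0 / (1.0 - mu + r * theta) ** 2 for r in range(n - x))
    return out


a1, a2 = 1.5, 0.7
s = a1 + a2
print(loglik_alpha(a1, a2), loglik_mu_theta(a1 / s, 1.0 / s))
print(neg_second_derivative_mu(a1 / s, 1.0 / s))
```

Under the assumed mapping the two kernels agree exactly for multinomial counts, since the $\log \sum \alpha$ factors cancel when the category counts add up to $n$; only the geometry seen by the iterative algorithm changes.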


4.  A  SIMULATION  STUDY 


An extensive simulation study was undertaken to compare the method of moments estimators (MME's), the MLE's and the jackknifed MLE's (JMLE's). Data were simulated using IMSL routines GGAMR and GGUBS. Roughly 5000 replications were used. First, we compare MME's with MLE's. It is expected that the MLE's will perform better and, in fact, this is the case, often substantially so. For both the DM and DNM cases, the parameters to vary are

k:  dimension  of  the  parameter  space
m:  the  sample  size
n:  generalized  Bernoulli  trial  stopping  parameter
and  $\alpha$:  the  Dirichlet  parameter  vector
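A data generator of the kind described can be sketched with gamma and uniform deviates only. This is an assumed reconstruction of the general recipe (normalized gammas give a Dirichlet vector, uniforms drive the multinomial draws), covering the DM case only; it does not reproduce the IMSL routines, and the function name and seed handling are hypothetical.

```python
import random


def simulate_dm(alpha, n, m, seed=0):
    """Draw m Dirichlet-multinomial count vectors, each with n trials.

    Each observation gets its own probability vector, obtained by
    normalizing independent gamma(alpha_i, 1) deviates.
    """
    rng = random.Random(seed)
    k = len(alpha)
    sample = []
    for _ in range(m):
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        total = sum(gammas)
        probs = [g / total for g in gammas]
        counts = [0] * k
        for _ in range(n):
            u = rng.random()
            cum = 0.0
            cell = k - 1  # fallback guards against rounding in the cumulative sum
            for i, p in enumerate(probs):
                cum += p
                if u < cum:
                    cell = i
                    break
            counts[cell] += 1
        sample.append(counts)
    return sample


data = simulate_dm(alpha=[1.0, 5.0], n=40, m=10, seed=1)
print(data)
```

Repeating this for each replication and applying each estimator to the resulting sample gives the raw material for the bias and mean squared error summaries.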
We present results for a few of the k = 2, 3, 6, m = 10, 40 and n = 40, 60 cases in the following tables. (See Leeds (1987) for additional simulation work.) For each estimator under each specification we can compute (i) bias, (ii) mean squared error, MSE, (iii) quadratic expected loss using information weighted loss, QEL, (iv) the exact covariance matrix, and (v) the inverse of Fisher's information matrix. BIASM, MSEM, QELM are for the MME's; BIAS, MSE, QEL are for the MLE's. Since we know the true $\alpha$, we can compute the asymptotic covariance of the MLE's, which is $m^{-1} I^{-1}(\alpha)$, to compare with the exact covariance at the fixed m. This is of interest since we do not know whether the MLE's for these Dirichlet mixed models are asymptotically efficient. Generally, we cannot verify the usual regularity conditions (see Lehmann (1980)).
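As a small worked check of item (v), the information matrix reported in the first Dirichlet-multinomial table ($\alpha = (1, 1)$, m = 10, n = 40) can be inverted directly. The helper below is a hypothetical utility written for this illustration; only the four matrix entries come from the tables.

```python
def inverse_2x2(mat):
    """Invert a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = mat
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


# Fisher's information as tabulated for alpha = (1, 1), m = 10, n = 40.
info = [[8.951, -6.20], [-6.20, 8.951]]
inv = inverse_2x2(info)
print([[round(v, 3) for v in row] for row in inv])
```

The rounded entries agree with the tabulated inverse (.215 on the diagonal, .149 off the diagonal), and these are the values to set against the simulated covariance matrix of the MLE's.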

In the last few tables we present simulation results, again based on 5000 replications, for a few cases (k = 2) comparing the JMLE with the MLE.


In  summary:


(i) Small sample case m = 10. For both DM and DNM cases, a feasible MME provided an accurate starting solution for the MNR algorithm. However, cases with $\theta \to 0$ or increasing k were more dependent on MEM backup. When MME's did not exist, the starting solution was chosen to be

4)'®'  =  (£,c, ...  ,c,-|-) 
k 

This  solution  performed  admirably  when  the  MME  did  not  exist.  The  asymptotic  covariance 
approximation  is  poor. 


(ii) Large sample case m = 40. For both DM and DNM compounds, the moments estimate was always feasible and provided a good starting solution for the MNR algorithm. This comes as no surprise since the MME's are consistent. However, if we were to receive a set of data with unequal $n_j$'s, we would not have an MME to start the MNR algorithm. The starting value in (i) should suffice. Here the asymptotic covariance approximation seems more reasonable.


(iii) For the k = 2 case Shenton (1950) reports that the efficiency of the MME relative to the MLE is at least 70%. This result is not contradicted. If we happen to be in this case we do not lose much by using the MME. However, for larger k there appears to be a rather dramatic reduction in efficiency of MME's compared to MLE's.


(iv) JMLE's performed exceptionally well in comparison to the MLE in terms of reducing mean squared error. This result is encouraging because evaluation of the JMLE relies heavily on the MLE algorithm, which must successfully compute every pseudo value and thus the JMLE itself. However, it should be mentioned that with frequent use of the MLE algorithm, convergence may be a problem, especially in the small sample case. For instance, if m = 10 we compute each pseudo value on nine observations. This is a 10% reduction in the amount of data considered. For the large sample case the reduction is far smaller (2.5% when m = 40).
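The pseudo value construction in (iv) can be sketched generically. The code below uses the standard delete-one jackknife; the scalar `biased_variance` estimator is a hypothetical stand-in chosen because its jackknifed value is known in closed form, whereas in the study the estimator would be the full MLE algorithm applied to each reduced sample.

```python
import statistics


def jackknife(estimator, data):
    """Delete-one jackknife: return the mean of the pseudo values and the list."""
    m = len(data)
    full = estimator(data)
    pseudo = []
    for j in range(m):
        reduced = data[:j] + data[j + 1:]  # m - 1 observations
        pseudo.append(m * full - (m - 1) * estimator(reduced))
    return sum(pseudo) / m, pseudo


def biased_variance(xs):
    """Stand-in estimator: variance with divisor len(xs), biased downward."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


sample = [23.0, 31.0, 1.0, 1.0, 3.0, 34.0, 17.0, 32.0, 31.0, 8.0]
jack_est, pseudo_values = jackknife(biased_variance, sample)
print(jack_est, statistics.variance(sample))
```

With m = 10 each call to the estimator inside the loop sees nine observations, which is where the convergence concern for small samples arises when the estimator is itself iterative.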
DIRICHLET-MULTINOMIAL TABLES

$\alpha_1$  $\alpha_2$  m   n
1           1           10  40

Method of Moments     Maximum Likelihood    Fisher's        Fisher's
Covariance Matrix     Covariance Matrix     Information     Inverse
1.513510  1.057329    1.398952  0.991609    8.951  -6.20    .215  .149
1.057329  1.239158    0.991609  1.168164    -6.20  8.951    .149  .215

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  0.414802  0.443557  1.685580  1.5956954
$\alpha_2$  0.400124  0.429745  1.399257  1.3528449

MSEM = 3.084837    MSE = 2.948540
QELM = 24.835702   QEL = 24.125809

$\alpha_1$  $\alpha_2$  m   n
1           1           40  40

Method of Moments     Maximum Likelihood    Fisher's         Fisher's
Covariance Matrix     Covariance Matrix     Information      Inverse
0.074439  0.053271    0.068876  0.047896    35.802  -24.8    .054  .037
0.053721  0.073378    0.047896  0.067768    -24.83  35.80    .037  .054

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  0.060176  0.065845  0.078060  0.0732114
$\alpha_2$  0.060324  0.066177  0.077017  0.0721478

MSEM = 0.155077    MSE = 0.145359
QELM = 52.383275   QEL = 52.261535
$\alpha_1$  $\alpha_2$  m   n
1           5           10  40

Method of Moments      Maximum Likelihood    Fisher's         Fisher's
Covariance Matrix      Covariance Matrix     Information      Inverse
1.469407  7.120741     1.240462  5.880053    11.04  -1.593    .027   1.244
7.120741  50.092428    5.880053  40.67265    -1.59  11.04     1.244  8.617

            BIASM     BIAS      MSEM       MSE
$\alpha_1$  0.603724  0.476951  1.834133   1.4679439
$\alpha_2$  3.684889  2.981112  63.678252  49.5596842

MSEM = 65.512385   MSE = 51.027628
QELM = 28.428756   QEL = 26.017101

$\alpha_1$  $\alpha_2$  m   n
1           5           40  40

Method of Moments     Maximum Likelihood    Fisher's         Fisher's
Covariance Matrix     Covariance Matrix     Information      Inverse
0.122884  0.601518    0.092132  0.441677    44.16   -6.37    .068  0.311
0.601518  4.082731    0.441677  3.162800    -6.374  1.384    .311  2.155

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  0.114245  0.085229  0.135936  0.0993962
$\alpha_2$  0.685972  0.547638  4.553290  3.4627071

MSEM = 4.689226    MSE = 3.562103
QELM = 67.3M3.33   QEL = 66.682778
$\alpha_1$  $\alpha_2$  $\alpha_3$  m   n
1           1           1           10  60

Method of Moments      Maximum Likelihood     Fisher's         Fisher's
Covariance Matrix      Covariance Matrix      Information      Inverse
0.653  0.428  0.415    0.335  0.188  0.187    11.13  -3.789    0.138  .072
0.428  0.714  0.470    0.188  0.334  0.200    11.13            0.138
0.415  0.470  0.693    0.187  0.200  0.339    -3.789  11.13    .072  0.138

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  0.474866  0.242310  0.879144  0.3937210
$\alpha_2$  0.475535  0.240140  0.939910  0.3910328
$\alpha_3$  0.475880  0.247340  0.919547  0.4004014

MSEM = 2.738601    MSE = l.!860.‘i6
QELM = 62.345719   QEL = 45.055172

$\alpha_1$  $\alpha_2$  $\alpha_3$  m   n
1           1           1           40  60

Method of Moments      Maximum Likelihood     Fisher's         Fisher's
Covariance Matrix      Covariance Matrix      Information      Inverse
0.056  0.032  0.032    0.042  0.022  0.022    44.52  -15.16    .035  .018
0.032  0.056  0.033    0.022  0.040  0.022    44.52            .035
0.032  0.033  0.058    0.022  0.022  0.041    -15.16  44.52    .018  .035

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  0.079744  0.048335  0.062908  0.0446581
$\alpha_2$  0.078848  0.048255  0.061938  0.0421660
$\alpha_3$  0.078873  0.047966  0.063734  0.0437279

MSEM = 0.188580        MSE = 0.130552
QELM = 95.803 1 4f)    QEL = 94.523415
$\alpha_1$  $\alpha_2$  $\alpha_3$  m   n
1           3           5           10  60

Method of Moments      Maximum Likelihood     Fisher's        Fisher's
Covariance Matrix      Covariance Matrix      Information     Inverse
0.938  2.293  3.884    0.461  1.063  1.815    11.5  -1.02     .164  .308  .543
2.293  10.12  14.89    1.063  4.757  6.833    2.36            .308  1.43  2.00
3.884  14.89  27.13    1.815  6.833  12.75    -1.02  .895     .543  2.00  4.05

            BIASM     BIAS      MSEM       MSE
$\alpha_1$  0.622727  0.297014  1.325775   0.5488681
$\alpha_2$  2.055143  1.002653  14.342457  5.7625477
$\alpha_3$  3.478274  1.721109  39.231529  15.715182

MSEM = 54.899761   MSE = 22.026598
QELM = 66.288906   QEL = 56.062711

$\alpha_1$  $\alpha_2$  $\alpha_3$  m   n
1           3           5           40  60

Method of Moments      Maximum Likelihood     Fisher's        Fisher's
Covariance Matrix      Covariance Matrix      Information     Inverse
0.065  0.134  0.232    0.049  0.098  0.171    45.9  -4.12     .041  .077  .136
0.134  0.620  0.887    1.063  4.757  6.833    9.42            .077  .359  .501
0.232  0.887  1.674    1.815  6.833  12.75    -4.12  3.58     .136  .501  1.01

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  0.098757  0.048925  0.074585  0.0510555
$\alpha_2$  0.327434  0.180411  0.726720  0.4933342
$\alpha_3$  0.544873  0.299283  1.971414  1.3429237

MSEM = 2.772719     MSE = 1.8873134
QELM = 267.365122   QEL = 265.9265337
$\alpha_1$  $\alpha_2$  $\alpha_3$  $\alpha_4$  $\alpha_5$  $\alpha_6$
1           1           1           1           1           1

Fisher's Information
11.89  -1.66  -1.66  -1.66  -1.66  -1.66
-1.66  11.89  -1.66  -1.66  -1.66  -1.66
-1.66  -1.66  11.89  -1.66  -1.66  -1.66
-1.66  -1.66  -1.66  11.89  -1.66  -1.66
-1.66  -1.66  -1.66  -1.66  11.89  -1.66
-1.66  -1.66  -1.66  -1.66  -1.66  11.89

Information Inverse
.107984  .034179
.107984
.107984
.107984
.107984
.034179  .107984

Method of Moments Covariance Structure
1.1084  .6072   .5829   .6386   .6566   .6053
.6072   1.0835  .5656   .6144   .6305   .6134
.5839   .5656   1.0680  .6068   .6242   .6028
.6386   .6144   .6068   1.1713  .6543   .6308
.6566   .6305   .6242   .6543   1.1760  .6602
.6053   .6134   .6028   .6308   .6602   1.1105

Maximum Likelihood Covariance Structure
.1865  .0721  .0670  .0718  .0730  .0678
.0721  .1912  .0698  .0726  .0746  .0732
.0670  .0698  .1858  .0715  .0717  .0681
.0718  .0726  .0715  .1880  .0702  .0680
.0730  .0746  .0717  .0702  .1904  .0695
.0678  .0732  .0681  .0680  .0695  .1809

            BIASM     BIAS      MSEM      MSE
$\alpha_1$  1.078768  0.177409  2.292481  0.2179630
$\alpha_2$  1.088996  0.182866  2.389933  0.2245902
$\alpha_3$  1.077059  0.170849  2.248338  0.2150239
$\alpha_4$  1.082371  0.179794  2.236290  0.2203275
$\alpha_5$  1.093295  0.175182  2.392863  0.2211166
$\alpha_6$  1.074797  0.183474  2.285986  0.2146019

MSEM = 13.8728295   MSE = 1.3136232
QELM = 124.469340   QEL = 60.31711222
$\alpha_1$  $\alpha_2$  $\alpha_3$  $\alpha_4$  $\alpha_5$  $\alpha_6$  m   n
1           1           1           1           1           5           10  60

Fisher's Information
11.29  -.90778
11.29
11.29
11.29
11.29
-.90778  .9853

Information Inverse
.12301  .04016  .26463
.04016  .12301
.12301
.26463  .04016  2.23381

Method of Moments Covariance Structure
2.056
1.222  1.9842
1.210  1.1634  1.945
1.223  1.2689  1.166  2.086
1.213  1.2033  1.167  1.222  1.949
7.610  7.3565  7.232  7.709  7.333  49.24

Maximum Likelihood Covariance Structure
.2401
.0968  .2347
.0944  .0909  .2260
.0972  .0968  .0877  .2332
.0976  .0980  .0881  .0914  .2281
.6347  .6131  .5841  .6212  .6067  4.8

            BIASM     BIAS      MSEM       MSE
$\alpha_1$  1.394611  0.203828  4.00442    0.2816740
$\alpha_2$  1.383026  0.205197  3.89698    0.2768250
$\alpha_3$  1.378675  0.198051  3.84595    0.2652151
$\alpha_4$  1.380749  0.199832  3.99241    0.2733169
$\alpha_5$  1.381548  0.200946  3.85799    0.2684726
$\alpha_6$  7.250609  6.082229  101.81507  6.0822289

MSEM = 121.41281   MSE = 7.4477325
QELM = 169.84660   QEL = 74.8955841
$\alpha_1$  $\alpha_2$  $\alpha_3$  $\alpha_4$  $\alpha_5$  $\alpha_6$  m   n
1           1           1           3           3           3           10  60

Fisher's Information
10.92  -.72916
10.92
10.92
2.486
2.486
-.72916  2.486

Information Inverse
.1265    .0407  .0407
.0407    .1265  .0407  .147374
.0407    .0407  .1265
.845     .534   .534
.147374  .534   .845   .534
.534     .534   .845

Method of Moments Covariance Structure
2.211
1.352  2.1805
1.428  1.3051  2.285
4.989  4.7342  4.947  19.9'
4.718  4.5615  4.684  16.58  18.61
4.924  4.6910  4.815  17.07  16.41  19.39

Maximum Likelihood Covariance Structure
.2296
.093!  .2498
.0925  .0932  .2475
.3397  .3551  .3415  1.82(j
.3215  .3423  .3179  1.216  7
.3368  .3546  .3391  1.272  1.215  1.8

            BIASM     BIAS       MSEM      MSE
$\alpha_1$  1.391406  0.202.M7   4.14743   0.2706460
$\alpha_2$  1.387812  0.208158   4.10649   0.2931731
$\alpha_3$  1.390962  0.205864   4.21970   0.2898637
$\alpha_4$  4.334136  0.6735(»O  38.75543  2.2733703
$\alpha_5$  4.269258  0.649777   36.84389  2.1189765
$\alpha_6$  4.345345  0.682773   38.27363  2.2626982

MSEM = 126.34657   MSE = 7.50872?
QELM = 192.06151   QEL = 94.784289
DIRICHLET-NEGATIVE MULTINOMIAL TABLES

Method of Moments    Maximum Likelihood    Fisher's         Fisher's
Covariance Matrix    Covariance Matrix     Information      Inverse
2.3209  6.7355       1.3750  4.2090        10.71  -2.604    .239  0.601
6.7355  33.2497      4.2090  18.9501       -2.60  1.037     .601  2.474

BIASM     BIAS      MSEM       MSE
1.396165  0.485078  4.270152   1.6103188
4.011426  1.791021  49.341280  22.1579009

MSEM = 53.611432   MSE = 23.768220
QELM = 48.274695   QEL = 29.4002.M


Method of Moments    Maximum Likelihood    Fisher's          Fisher's
Covariance Matrix    Covariance Matrix     Information       Inverse
0.1839  0.4094       0.0823  0.2195        42.86    -10.42   .060  0.150
0.4094  1.6422       0.2195  0.9222        -10.417  4.149    .150  0.618

BIASM     BIAS      MSEM      MSE
0.511964  0.077721  0.446043  0.0883497
1.242014  0.292713  3.184778  1.0078931

MSEM = 3.63082    MSE = 1.096243
QELM = 79.22263   QEL = 67.379283


$\alpha_1$  $\alpha_2$  m   n
1           5           10  30

Method of Moments    Maximum Likelihood    Fisher's         Fisher's
Covariance Matrix    Covariance Matrix     Information      Inverse
3.0960   15.8208     1.9116   10.1158      10.487  -1.57    0.2899  1.2976
15.8208  115.0248    10.1158  73.1788      -1.572  0.351    1.2976  8.6557

            BIASM     BIAS      MSEM        MSE
$\alpha_1$  1.239984  0.575918  4.634425    2.2432696
$\alpha_2$  6.766865  3.659899  160.815227  86.5736542

MSEM = 156.4496   MSE = 88.816923
QELM = 89.2794    QEL = 53.809407

$\alpha_1$  $\alpha_2$  m   n
1           5           40  30

Method of Moments    Maximum Likelihood    Fisher's         Fisher's
Covariance Matrix    Covariance Matrix     Information      Inverse
0.2066  0.9496       0.1074  0.5715        41.949  -6.29    0.0725  .32440
0.9496  5.9745       0.5175  3.5226        -6.289  1.405    0.3244  2.1639

            BIASM     BIAS      MSEM       MSE
$\alpha_1$  0.344309  0.098942  0.3251636  0.1171760
$\alpha_2$  1.657210  0.597938  8.7208346  3.8801412

MSEM = 9.04599    MSE = 3.9973172
QELM = 69.64925   QEL = 65.9877250
Method of Moments         Maximum Likelihood     Fisher's               Fisher's
Covariance Matrix         Covariance Matrix      Information            Inverse
11.891  8.646  7.3191     3.417  2.861  2.824    2.45   -1.02  -1.02    .923  .635  .6051
8.646   11.07  7.16.M     2.861  3.411  2.812    -1.02  2.452  -1.02    .635  .923  .6051
7.319   7.163  8.9517     2.824  2.812  3.438    -1.02  -1.02  2.622    .605  .605  .8513

            BIASM     BIAS      MSEM       MSE
$\alpha_1$  3.989816  OSS’:??   27.810048  4.2047086
$\alpha_2$  3.927451  0.873922  26.493368  4.1745338
$\alpha_3$  3.072373  0.870821  18.391223  4.1963659

MSEM = 72.694639    MSE = 12.575608
QELM = 107.797086   QEL = 64.542774

Method of Moments        Maximum Likelihood     Fisher's               Fisher's
Covariance Matrix        Covariance Matrix      Information            Inverse
0.226  0.523  0.8543     0.122  0.295  0.485    30.72  -3.19  -3.19    .076  .161  .2611
0.523  2.202  3.0777     0.295  1.167  1.685    -3.19  6.668  -3.19    .161  .657  .8982
0.854  3.077  5.2502     0.485  1.685  2.988    -3.19  -3.19  2.906    .261  .898  1.617

BIASM     BIAS      MSEM       MSE
0.401361  0.111901  0.3870097  0.134638!
1.252126  0.367740  3.7695374  1.3020395
1.820276  0.601499  8.5636105  3.3473450

MSEM = 12.720158    MSE = 4.780227
QELM = 157.781792   QEL = 151.413406


$\alpha_1$  $\alpha_2$  $\alpha_3$  $\alpha_4$  $\alpha_5$  $\alpha_6$  m   n
1           1           1           1           1           5           10  10

Fisher's Information
7.7653  -0.71235
7.7653
7.7653
7.7653
7.7653
-0.71235  0.81150

Information Inverse
.20614  .08818  ...  .08818  .49059
.08818  .20614  .08818  .49059
.08818  .08818  .20614  .49059
.49059  .49059  ...  3.38551

Method of Moments Covariance Structure
127.6
103.0  138.50
90.3   89.87   103.3
99.9   99.78   90.7   124.2
95.4   95.02   87.5   93.8   111.6
436.8  447.59  404.9  446.2  411.5  2054.

Maximum Likelihood Covariance Structure
2.462
2.049  2.394
2.618  2.556  3.621
1.573  1.524  1.925  1.712
3.642  3.471  4.537  2.254  7.524
17.21  16.64  21.57  12.06  31.3   146

            BIASM      BIAS      MSEM        MSE
$\alpha_1$  5.930699   0.444315  162.82029   2.6592233
$\alpha_2$  5.771361   0.413110  171.86984   2.5647757
$\alpha_3$  5.736776   0.436712  136.22782   3.8115527
$\alpha_4$  5.856497   0.414522  158.52570   1.8840746
$\alpha_5$  5.856497   0.443810  144.12501   7.7205665
$\alpha_6$  25.436602  2.407225  2701.14373  151.8331676

MSEM = 3473.71240    MSE = 170.473361
QELM = 231.3.20929   QEL = 129.862213
JACKKNIFE RESULTS (DIRICHLET-MULTINOMIAL TABLES k = 2)

$\alpha_1$  $\alpha_2$  m   n
1           1           10  40

Biases
            JMLE     MLE
$\alpha_1$  -0.2265  .4275
$\alpha_2$  -0.2248  .4161

Mean Squared Error
            JMLE    MLE
$\alpha_1$  0.3729  1.324
$\alpha_2$  0.3550  1.153

$\alpha_1$  $\alpha_2$  m   n
1           5           10  40

Biases
            JMLE     MLE
$\alpha_1$  -0.4358  0.5030
$\alpha_2$  -2.4852  3.1318

Mean Squared Error
            JMLE     MLE
$\alpha_1$  0.3714   1.743
$\alpha_2$  10.9462  63.054
$\alpha_1$  $\alpha_2$  m   n
0.5         1           10  10

Biases
            JMLE     MLE
$\alpha_1$  -0.2040  .2425
$\alpha_2$  -0.4619  .6049

Mean Squared Error
            JMLE    MLE
$\alpha_1$  o.osoo  0.4546
$\alpha_2$  0.4322  3.1011

$\alpha_1$  $\alpha_2$  m   n
0.5         0.5         10  10

Biases